class: title-slide

<br>
<br>

# A Data Mining Approach for Detecting Collusion in Unproctored Online Exams<br>

.padding_left.pull-down.white[
J. Langerbein, T. Massing, .bold[_J. Klenke_], M. Striewe, M. Goedicke, C. Hanck, N. Reckmann

<br>
<br>
<br>

`\(15^{th}\)` International Conference on Educational Data Mining
Bangalore, 11-14 July, 2023
]

---

<h1> Outline </h1>

`\(\quad\)`

1. [Introduction](#introduction)
1. [Related work](#related_work)
1. [Methodology](#methodology)
1. [Empirical Results](#empirical_results)
1. [Discussion](#discussion)
1. [References](#references)

---
name: introduction

<h1> Introduction </h1>

* COVID-19 forced universities to switch to online classes and exams.
* Proctoring online exams with video-conferencing software was often prohibited by data protection regulations and economically infeasible.
* In this case study, take-home exams were conducted as open-book exams, but collaboration was strictly prohibited.
* Hierarchical clustering algorithms were used to identify groups of potentially colluding students.
* The method successfully found groups with nearly identical exams.
* A proctored comparison group helped categorize student groups as "outstandingly similar".

---
name: related_work

<h1> Related work </h1>

* Limited research exists on unproctored exams at universities prior to the pandemic.
* <a href='#bib-cleophas2021s'>Cleophas et al. (2021)</a> propose a method using event logs to detect collusion in unproctored exams.
* Previous studies focused on similarity measures for programming exams based on keyboard patterns, e.g. <a href='#bib-Hellas_2017'>Hellas et al. (2017)</a> and <a href='#bib-Leinonen_2016'>Leinonen et al. (2016)</a>.
* Other literature (e.g. <a href='#bib-hemming2010online'>Hemming (2010)</a>) relies on surveys or interviews and lacks data on actual student behavior regarding collusion.
* Some studies suggest that unsupervised online exams may lead to collusion.
* <a href='#bib-hollister2009proctored'>Hollister and Berenson (2009)</a> used GPA and final exam scores to analyze collusion, but no data collected during the exam.

---
name: methodology

<h1> Methodology — <span style="font-size: 0.8em;"> Model </span> </h1>

* Data for the study were collected from the *Descriptive Statistics* course at the University of Duisburg-Essen, Germany.
* The test group took the unproctored exam at home during the COVID-19 pandemic, while the comparison group took a proctored exam in class before the pandemic.
* The exams consisted of arithmetical problems, programming tasks in `R`, and a short essay task.
* Event logs captured students' activities and time stamps during the exams, and the points achieved per task were recorded.
* During data cleaning, we removed students with minimal participation or achievement, as well as those who reported internet problems.
* Despite differences in exam format, both groups shared similar content and learning goals, with opportunities for questions and discussions.
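
As a rough illustration of how such an event log could be prepared for analysis, the sketch below counts submissions per student and task in fixed time intervals. The log layout (tuples of student, task, and seconds since exam start), the function name `interval_counts`, and its default parameters are assumptions made for this example, not the format used in the study.

```python
from collections import defaultdict


def interval_counts(events, n_intervals=70, interval_seconds=60):
    """Count submissions per (student, task) in each time interval.

    `events` is an iterable of (student_id, task_id, seconds_since_start)
    tuples; this layout is hypothetical.
    """
    counts = defaultdict(lambda: [0] * n_intervals)
    for student, task, seconds in events:
        m = int(seconds // interval_seconds)  # zero-based interval index
        if 0 <= m < n_intervals:  # ignore events outside the exam window
            counts[(student, task)][m] += 1
    return dict(counts)


# Toy log (hypothetical values)
events = [
    ("s1", "task1", 30),
    ("s1", "task1", 95),
    ("s2", "task1", 100),
    ("s1", "task2", 4199),
]
counts = interval_counts(events)
```

Each resulting vector holds one count per interval, so two students' vectors for the same task can be compared element by element.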
---

<h1> Methodology — <span style="font-size: 0.8em;"> Data set </span> </h1>

<br>
<br>

<html>
<style>
table { border-collapse: collapse; }
table, th, td { text-align: center; }
th, td { border-bottom: 1px solid black; border-right: 1px solid black; }
th:first-child, td:first-child { border-right: 2px solid black; }
tr:first-child th, tr:first-child td { border-bottom: 2px solid black; }
</style>
<body>
<h2> </h2>
<table style="width:100%">
  <tr> <th> </th> <th>Comparison group</th> <th>Test group</th> </tr>
  <tr> <th>Year</th> <td>2018/2019</td> <td>2020/2021</td> </tr>
  <tr> <th>N</th> <td>109</td> <td>151</td> </tr>
  <tr> <th>Style</th> <td>proctored (in class)</td> <td>unproctored (at home)</td> </tr>
  <tr> <th>Total points</th> <td>60</td> <td>60</td> </tr>
  <tr> <th>Sub tasks</th> <td>19</td> <td>17</td> </tr>
  <tr> <th>Minutes</th> <td>60</td> <td>60</td> </tr>
</table>
</body>
</html>

---

<h1> Methodology — <span style="font-size: 0.8em;"> Model </span> </h1>

* Agglomerative (bottom-up) hierarchical clustering algorithm
* Global pairwise dissimilarities `$$D(x_i, x_{i'}) = \frac{1}{h} \sum_{j=1}^h w_j \cdot d_j(x_{ij}, x_{i'j}) \quad \text{with} \quad \sum_{j=1}^h w_j = 1$$`
* `\(D(x_i, x_{i'})\)`: Global pairwise dissimilarity
* `\(d_j(x_{ij}, x_{i'j})\)`: Pairwise attribute dissimilarity
* `\(i = 1, \ldots, N\)` with `\(N = 151\)` students
* `\(j = 1, \ldots, h\)` attributes
* We compared two different kinds of attributes:
  * Dissimilarities in the students' event patterns (time of submission)
  * Dissimilarities in points achieved

---

<h1> Methodology — <span style="font-size: 0.8em;"> Model </span> </h1>

### Dissimilarities in the students' event patterns (time of submission)

* `\(d_j^L(v_{ij}, v_{i'j})\)` with weights `\(w_j^L\)`
* We divided the examination into 70 one-minute intervals, `\(m = 1, \ldots, 70\)`, since both exams lasted `\(70\)` min.
* `\(v_{ijm}\)` denotes the number of answers submitted by student `\(i\)` for attribute `\(j\)` during the `\(m\)`-th interval.
* The Manhattan metric is used to calculate the pairwise attribute dissimilarity.

`\(\quad\)`

`$$d_j^L(v_{ij}, v_{i'j}) = \sum_{m=1}^{70} | v_{ijm} - v_{i'jm} |$$`

---

<h1> Methodology — <span style="font-size: 0.8em;"> Model </span> </h1>

### Dissimilarities in points achieved

* `\(d_j^P(s_{ij}, s_{i'j})\)` with weights `\(w_j^P\)`
* `\(s_{ij}\)` denotes the points achieved by student `\(i\)` in the `\(j\)`-th sub task.
* The absolute difference is used as the dissimilarity measure.

`\(\quad\)`

`$$d_j^P(s_{ij}, s_{i'j}) = | s_{ij} - s_{i'j} |$$`

---

<h1> Methodology — <span style="font-size: 0.8em;"> Model </span> </h1>

### Full model

`$$D(s_i, s_{i'}, v_i, v_{i'}) = \frac{1}{h} \sum_{j=1}^h (w_j^P \cdot d_j^P (s_{ij}, s_{i'j}) + w_j^L \cdot d_j^L (v_{ij}, v_{i'j})) \quad \text{with} \quad \sum_{j=1}^h (w_j^P + w_j^L) = 1$$`

* The weights `\(w_j^P\)` and `\(w_j^L\)` control the influence of each attribute on the global object dissimilarity.
* We reduced the weights for:
  * `R`-tasks and free-text questions, since the event log might not be comparable in these cases
  * Points achieved
* Since dissimilarity measures depend on scale, the attributes were normalized.

---
name: empirical_results

<h2>Empirical results — <span style="font-size: 0.8em;">Dendrogram</span></h2>

.panelset.sideways[
.panel[.panel-name[Control group]

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#../resources/graphics/dendogram_control.png" alt="<strong> Figure 1: </strong> Dendrogram produced by average linkage clustering of the proctored control group (2018/19). <strong> G-L </strong> mark the clusters with the lowest dissimilarity." width="150%" />
<p class="caption"><strong> Figure 1: </strong> Dendrogram produced by average linkage clustering of the proctored control group (2018/19).
<strong> G-L </strong> mark the clusters with the lowest dissimilarity.</p>
</div>
]
.panel[.panel-name[Test group]

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#../resources/graphics/dendogram_test.png" alt="<strong>Figure 2:</strong> Dendrogram produced by average linkage clustering of the unproctored test group (2020/21). <strong> A-F </strong> mark the clusters with the lowest dissimilarity." width="100%" />
<p class="caption"><strong>Figure 2:</strong> Dendrogram produced by average linkage clustering of the unproctored test group (2020/21). <strong> A-F </strong> mark the clusters with the lowest dissimilarity.</p>
</div>
]
]

<h3 style="margin-bottom: -15px;">Results</h3>
<p style="margin-top: 0; font-size: 70%;">
<ul>
<li>The control group has an overall higher level of dissimilarity and does not contain any strikingly similar clusters. The six lowest clusters from the test group stand out in terms of similarity, especially clusters <strong>A</strong>, <strong>B</strong>, and <strong>E</strong>.</li>
</ul>
</p>

---

<h2>Empirical results — <span style="font-size: 0.8em;">Distribution of measured distances </span> </h2>

<br>
<br>

.pull-left[
<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#../resources/graphics/boxplot_original.png" alt="<strong>Figure 3.1:</strong> Comparison of the non-normalised distance measures." width="100%" height="60%" />
<p class="caption"><strong>Figure 3.1:</strong> Comparison of the non-normalised distance measures.</p>
</div>
]

.pull-right[
<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#../resources/graphics/boxplot_comparison.png" alt="<strong>Figure 3.2:</strong> Comparison of the normalised distance measures."
width="100%" height="60%" />
<p class="caption"><strong>Figure 3.2:</strong> Comparison of the normalised distance measures.</p>
</div>
]

---

<h2>Empirical results — <span style="font-size: 0.8em;">Cluster comparison</span></h2>

.panelset[
.panel[.panel-name[AB]

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#../resources/graphics/plot_ab.png" alt="<strong>Figure 4.1:</strong> Comparison of the event logs and achieved points of the clusters <strong>A</strong> and <strong>B</strong> from the test group (2020/21). Above the scatter plot, a bar chart is added to compare the points per subtask." width="80%" />
<p class="caption"><strong>Figure 4.1:</strong> Comparison of the event logs and achieved points of the clusters <strong>A</strong> and <strong>B</strong> from the test group (2020/21). Above the scatter plot, a bar chart is added to compare the points per subtask.</p>
</div>
]
.panel[.panel-name[CD]

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#../resources/graphics/plot_cd.png" alt="<strong>Figure 4.2:</strong> Comparison of the event logs and achieved points of the clusters <strong>C</strong> and <strong>D</strong> from the test group (2020/21). Above the scatter plot, a bar chart is added to compare the points per subtask." width="80%" />
<p class="caption"><strong>Figure 4.2:</strong> Comparison of the event logs and achieved points of the clusters <strong>C</strong> and <strong>D</strong> from the test group (2020/21). Above the scatter plot, a bar chart is added to compare the points per subtask.</p>
</div>
]
.panel[.panel-name[EF]

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#../resources/graphics/plot_ef.png" alt="<strong>Figure 4.3:</strong> Comparison of the event logs and achieved points of the clusters <strong>E</strong> and <strong>F</strong> from the test group (2020/21).
Above the scatter plot, a bar chart is added to compare the points per subtask." width="80%" />
<p class="caption"><strong>Figure 4.3:</strong> Comparison of the event logs and achieved points of the clusters <strong>E</strong> and <strong>F</strong> from the test group (2020/21). Above the scatter plot, a bar chart is added to compare the points per subtask.</p>
</div>
]
.panel[.panel-name[GH]

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#../resources/graphics/plot_gh.png" alt="<strong>Figure 4.4:</strong> Comparison of the event logs and achieved points of the clusters <strong>G</strong> and <strong>H</strong> from the control group (2018/19). Above the scatter plot, a bar chart is added to compare the points per subtask." width="80%" />
<p class="caption"><strong>Figure 4.4:</strong> Comparison of the event logs and achieved points of the clusters <strong>G</strong> and <strong>H</strong> from the control group (2018/19). Above the scatter plot, a bar chart is added to compare the points per subtask.</p>
</div>
]
.panel[.panel-name[IJ]

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#../resources/graphics/plot_ij.png" alt="<strong>Figure 4.5:</strong> Comparison of the event logs and achieved points of the clusters <strong>I</strong> and <strong>J</strong> from the control group (2018/19). Above the scatter plot, a bar chart is added to compare the points per subtask." width="80%" />
<p class="caption"><strong>Figure 4.5:</strong> Comparison of the event logs and achieved points of the clusters <strong>I</strong> and <strong>J</strong> from the control group (2018/19).
Above the scatter plot, a bar chart is added to compare the points per subtask.</p>
</div>
]
.panel[.panel-name[KL]

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#../resources/graphics/plot_kl.png" alt="<strong>Figure 4.6:</strong> Comparison of the event logs and achieved points of the clusters <strong>K</strong> and <strong>L</strong> from the control group (2018/19). Above the scatter plot, a bar chart is added to compare the points per subtask." width="80%" />
<p class="caption"><strong>Figure 4.6:</strong> Comparison of the event logs and achieved points of the clusters <strong>K</strong> and <strong>L</strong> from the control group (2018/19). Above the scatter plot, a bar chart is added to compare the points per subtask.</p>
</div>
]
]

---

<h1> Discussion </h1>

* The results of hierarchical clustering algorithms are presented in a dendrogram, which visualizes the hierarchy of merges.
* A dendrogram resembles a tree structure, where objects are merged based on their dissimilarity in a bottom-up approach.
* Various hierarchical clustering algorithms exist, and the cophenetic correlation coefficient is used to assess how well each algorithm represents the original structure in the data.
* Average linkage clustering is deemed the most suitable algorithm for the analysis.
* The dendrogram shows compact clusters at medium dissimilarities, with three notable clusters (**A**, **B**, and **E**) consisting of two students each, indicating the absence of collusion in larger groups.
* Scatter plots and bar charts are used to examine the similarity of students' chronology and achieved points within clusters.

---
name: discussion

<h1> Discussion </h1>

* The results from the comparison group support the findings: collusion over the entire exam is unlikely, and the differences between the groups are not coincidental.
* The method successfully detects at least three clusters with nearly identical exams.
* The approach provides a basis for further examination of clusters based on comparison with a reference group, but the ground truth is not known, which limits the certainty of the conclusions.
* Nevertheless, the elevated risk of detection may discourage students from cheating in unproctored exams.
* This is not only an important step in adapting to the progressing digitization of education; it also equips us better for unforeseen situations in the future, such as the COVID-19 pandemic.

---

<h1> Further research </h1>

* Exploring the long-term effectiveness of the detection method in deterring students from colluding in exams, and its impact on academic integrity and student behavior.
* Developing and implementing methods to collect and analyze complementary evidence, with the aim of improving detection rates and understanding the extent of collusion among students.

---
name: references

# References

.font80[
Cleophas, C., C. Hoennige, F. Meisel, and P. Meyer (2021). "Who's Cheating? Mining Patterns of Collusion from Text and Events in Online Exams". In: _Mining Patterns of Collusion from Text and Events in Online Exams (April 12, 2021)_.

Hellas, A., J. Leinonen, and P. Ihantola (2017). _Plagiarism in Take-Home Exams: Help-Seeking, Collaboration, and Systematic Cheating_. ITiCSE '17. Bologna, Italy: Association for Computing Machinery, pp. 238–243. ISBN: 9781450347044. DOI: 10.1145/3059009.3059065. <https://doi.org/10.1145/3059009.3059065>.

Hemming, A. (2010). "Online tests and exams: lower standards or improved learning?" In: _The Law Teacher_ 44.3, pp. 283-308.

Hollister, K. K. and M. L. Berenson (2009). "Proctored versus unproctored online exams: Studying the impact of exam environment on student performance". In: _Decision Sciences Journal of Innovative Education_ 7.1, pp. 271-294.

Leinonen, J., K. Longi, A. Klami, A. Ahadi, and A. Vihavainen (2016).
_Typing patterns and authentication in practical programming exams_, pp. 160-165.
]
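
---

<h1> Appendix </h1>

The following is a minimal sketch, not the authors' implementation, of the weighted global dissimilarity from the Methodology slides combined with a naive average linkage clustering. All function names, the toy values, and the weights are hypothetical, and the normalization of attributes is omitted for brevity.

```python
from itertools import combinations


def manhattan(u, v):
    """Manhattan distance between two equally long count vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def global_dissimilarity(s_a, s_b, v_a, v_b, w_points, w_log):
    """(1/h) * sum_j (w_j^P * |s_aj - s_bj| + w_j^L * manhattan(v_aj, v_bj))."""
    h = len(s_a)
    total = 0.0
    for j in range(h):
        total += w_points[j] * abs(s_a[j] - s_b[j])
        total += w_log[j] * manhattan(v_a[j], v_b[j])
    return total / h


def average_linkage(dist, n):
    """Naive agglomerative clustering; returns (cluster_a, cluster_b, height) merges."""
    def avg(a, b):
        # mean pairwise dissimilarity between the members of two clusters
        return sum(dist[min(i, k), max(i, k)] for i in a for k in b) / (len(a) * len(b))

    clusters = [(i,) for i in range(n)]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: avg(*pair))
        merges.append((a, b, avg(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Toy data (hypothetical): 4 students, h = 2 sub tasks, 3 time intervals.
points = [[2, 3], [2, 3], [0, 1], [3, 0]]
counts = [
    [[1, 0, 0], [0, 1, 0]],
    [[1, 0, 0], [0, 1, 0]],
    [[0, 0, 1], [1, 0, 0]],
    [[0, 2, 0], [0, 0, 1]],
]
w_points = [0.1, 0.1]  # reduced weight for points achieved
w_log = [0.4, 0.4]     # all four weights sum to 1

n = len(points)
dist = {
    (i, k): global_dissimilarity(points[i], points[k], counts[i], counts[k], w_points, w_log)
    for i, k in combinations(range(n), 2)
}
merges = average_linkage(dist, n)
```

In this toy example the first two students have identical points and event patterns, so they are merged first at height zero. The heights in `merges` correspond to the levels at which branches join in a dendrogram.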